ABMapper: a suffix array-based tool for multi-location searching and splice-junction mapping
Abstract
Sequencing reads generated by RNA sequencing (RNA-seq) must first be mapped back to the genome by alignment before they can be analyzed further. Current fast, memory-efficient short-read mappers give a quick view of the transcriptome. However, they are designed neither for reads that span splice junctions nor for repetitive reads that map to multiple locations in the genome (multi-reads). Here, we describe a new software package, ABMapper, which is specifically designed to explore all putative locations of reads that map to splice junctions or are repetitive in nature.
Availability and implementation: The software is freely available at http://abmapper.sourceforge.net/. It is written in C++ and Perl and runs on all major platforms and operating systems, including Windows, Mac OS X and Linux.
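As a rough illustration of why a suffix array suits multi-location searching, the sketch below finds every position at which a short read occurs in a reference string. It is not ABMapper's implementation: the function names `build_suffix_array` and `find_all`, the toy `genome` string, and the naive construction are all assumptions made for this example, and it handles exact matches only (no splice junctions or mismatches).

```python
from typing import List


def build_suffix_array(text: str) -> List[int]:
    """Return the start positions of all suffixes of text in sorted order."""
    # Naive construction by sorting suffix strings; acceptable for a toy
    # example only, not for genome-scale input.
    return sorted(range(len(text)), key=lambda i: text[i:])


def find_all(text: str, sa: List[int], pattern: str) -> List[int]:
    """Return every start position where pattern occurs in text, ascending."""
    m = len(pattern)

    # Lower bound: first suffix whose m-character prefix is >= pattern.
    lo, hi = 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    start = lo

    # Upper bound: first suffix whose m-character prefix is > pattern.
    hi = len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid

    # All suffixes in sa[start:lo] begin with pattern, so a repetitive read
    # yields all of its locations from one contiguous block.
    return sorted(sa[start:lo])


genome = "ACGTACGTTACG"  # hypothetical toy reference
sa = build_suffix_array(genome)
hits = find_all(genome, sa, "ACG")
```

Because all suffixes sharing a prefix are adjacent in the sorted order, two binary searches are enough to report every location of a multi-read, rather than only a single best hit.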
Related resources
Keyword-driven Suffix Arrays for On-line Keyword Searching from Documents in Chinese
On-line keyword searching in Chinese documents tends to rely on inverted indexing as the main technique, which has its difficulties. The suffix array is widely used for processing text in Western languages, but it has not been widely adopted for Chinese processing because of the particular characteristics of Chinese. The suffix array is a powerful tool; however, it costs too much space. That is the major bottleneck o...
essaMEM: finding maximal exact matches using enhanced sparse suffix arrays
We have developed essaMEM, a tool for finding maximal exact matches that can be used in genome comparison and read mapping. essaMEM enhances an existing sparse suffix array implementation with a sparse child array. Tests indicate that the enhanced algorithm for finding maximal exact matches is much faster, while maintaining the same memory footprint. In this way, sparse suffix arrays remain com...
PSISA: An Algorithm for Indexing and Searching Protein Structure using Suffix Arrays
Protein Structure Indexing using Suffix Array (PSISA) is a new technique that retrieves similar proteins based on their structures. Indexing the protein structure is one approach to searching for protein similarities. In this paper we develop our proposed technique based on a novel use of the suffix array. We start by converting protein structure into a sequence by ...
Gapped Suffix Arrays: a New Index Structure for Fast Approximate Matching
Approximate searching using an index is an important application in many fields. In this paper we introduce a new data structure, the gapped suffix array, for approximate searching in the Hamming distance model. Building on the well-known filtration approach to approximate searching, the gapped suffix array can improve search speed by avoiding the merging of position lists.
Improving Exact Search of Multiple Patterns From a Compressed Suffix Array
Self-indexes are widely studied and widely applied structures in string matching. However, the exact matching of multiple patterns using self-indexes has not been the subject of concentrated study, although it may have direct and indirect applications in fields such as bioinformatics. This paper presents a method of improving the exact search of multiple...